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Abstract — Gaussian mixtures are a common density represen- 
tation in nonlinear, non-Gaussian Bayesian state estimation. Se- 
lecting an appropriate number of Gaussian components, however, 
is difficult as one has to trade of computational complexity against 
estimation accuracy. In this paper, an adaptive Gaussian mixture 
filter based on statistical linearization is proposed. Depending on 
the nonlinearity of the considered estimation problem, this filter 
dynamically increases the number of components via splitting. 
For this purpose, a measure is introduced that allows for quan- 
tifying the locally induced linearization error at each Gaussian 
mixture component. The deviation between the nonlinear and 
the linearized state space model is evaluated for determining the 
splitting direction. The proposed approach is not restricted to 
a specific statistical linearization method. Simulations show the 
superior estimation performance compared to related approaches 
and common filtering algorithms. 

Keywords: Bayesian estimation, nonlinear filtering, statis- 
tical linearization, Kalman filtering, Gaussian mixtures. 

I. Introduction 

Bayesian state estimation for nonlinear systems requires an 
efficient approximation for practical applications as closed- 
form solutions are not available in general. A common ap- 
proximation technique is the discretization of the state space 
as done in grid filters or particle filters 1 1 1 . Theoretically, these 
techniques facilitate to approach the true statistics of the state 
with arbitrary accuracy. But they are only applicable to low- 
dimensional problems since their computational complexity 
increases exponentially with the dimension of the state space. 

A famous exception that exhibits an analytic solution is 
the linear Gaussian case. Here, the famous Kalman filter 
provides optimal results in an efficient manner p). So-called 
Gaussian filters try to adapted the Kalman filter equations 
to nonlinear problems by assuming that the density function 
of the state can be represented by a Gaussian density. The 
extended Kalman filter |3| applies first-order Taylor series 
expansion for linearization. The unscented Kalman filter Q, 
(|5) or the Gaussian estimator Q offer higher order accuracy 
by employing statistical linearization. But in general a single 
Gaussian density is typically not a sufficient representation for 
the true density function, which may be skew or multimodal. 
Thanks to their universal approximator property, Gaussian 
mixtures f?! are a much better approach for approximating 
complex density functions. Examples for Gaussian mixture 
filters applied to nonlinear estimation are in fSl, ||9l. 



The estimation accuracy of Gaussian mixture filters signif- 
icantly depend on the number of Gaussian components used. 
This number is typically defined by the user In this paper, 
a novel Gaussian mixture filter is proposed, which adapts the 
number of components dynamically and on-line. The nonlinear 
system and measurement models are linearized locally by 
means of statistical linearization at each component of the 
Gaussian mixture. The induced linearization error is quantified 
by means of the linearization error covariance matrix. Based 
on this error, a novel moment-preserving splitting procedure is 
proposed for introducing new mixture components. The com- 
ponent causing the highest Unearization error is selected, while 
splitting is performed in direction of the strongest nonlinearity, 
i.e., the strongest deviation between the nonlinear model and 
its linearized version. Both linearization and splitting are 
independent of the used statistical linearization method, which 
makes the proposed filter versatilely applicable. 

The paper is structured as follows: The Bayesian state 
estimation problem is formulated in the next section. In 



Sec. Ill a brief introduction in statistical linearization is given. 



The novel splitting scheme is derived in Sec. IV Based on 
this. Sec. M describes the complete adaptive Gaussian mixture 
filter with all major components. Numerical evaluation by 
means of simulations is part of Sec. [Vl] The paper closes 
with concluding remarks. 

II. Problem Formulation 
In this paper, discrete-time nonlinear dynamic systems 



^k = Uik\^k->'^k) 



(1) 

(2) 



are considered. Here, ([T]) is the dynamics model with the 
known time- variant nonlinear system function aj,(), which 
propagates the system stataM x^, e R"^- at time step k to 
time step fc + 1, given the current system input u^ G E,"" 
and the process noise w;^. G R""' . The measurement model is 
given by (|2]l, where Z^a; ( ' ) is the known time-variant nonlinear 
measurement function, ^j. e R"= is the measurement vector, 
and y_^. G R"" is the measurement noise. Note that an actual 
measurement value z^, is a realization of the random vector 



^k 



•H- 



Random vectors ai'e denoted by boldface letters. 



Both noise processes Wf. and Vf. are assumed to be in- 
dependent and white. The probability density functions of 
Wf. and Vk are denoted by /^(u^j.) and /^(w^), respectively. 
It is assumed that these density functions are described via 
Gaussian mixtures 



/rK) = E<^--^(^fc'^M'Cy , 






/fc-K) = 5^<.-AAK;iM'CL), 



(3) 



(4) 



where L^, L| are the numbers of mixture components, 
w™ ■, w^ j are non-negative weights that sum up to one, and 
A/'(w;w, C™) is a Gaussian density with mean vector w and 
covariance matrix C™. The initial density function /^(xq) of 
the system state at time step fc = is also assumed to be given 
as a Gaussian mixture. 

Estimating the system state from noisy measurements is 
done according to the Bayesian framework. Here, two steps 
are performed alternately, namely the prediction step and the 
filtering step. In the prediction step, the density f^{xf.) := 
fki^klyiO-.kT^o-.k) of the previous filtering step is propagated 
to the next time step according to 

/fc+l(aifc + l) '■— Jk+li^k+lliLo-.kT ^0:k) 

f{xk+i\xk,u,„Wk) ■fkixk)-fkiwk)dXkdwk, (5) 

where z^.f^ = {zq,z^, . . . ,Zf.) denotes the measurements up to 
and including time step k, f {x|^_^_-^^\x|^, Uj^, Wi.) is the transition 
density depending on the dynamics model ([TJ, and (5( • ) is the 
Dirac delta distribution. 

The filtering step determines the posterior density ff,{xi.) of 
the system state Xj, based on all acquired measurement values 
according to Bayes' law 

fki^k) - Ck ■ /Ufckfe) • flixk) , 

where Ck is a normalization constant and f{zj.\xi.) is the 
likelihood function given by 



/Ufckfe)= / Kzk-hk{xk,Vk))- fk{v.k)<^v-k 

and the measurement model (|2|. 

In general, for arbitrary nonlinear systems with arbitrarily 
distributed random vectors, there exist no analytical solutions 
of the prediction step and filtering step. Thus, for efficient 
estimation, it is inevitable to apply an approximate solution. 
In the following, an adaptive approximation scheme is pro- 
posed, where the predicted and posterior state densities are 
represented by means of Gaussian mixtures 



fki^k) 



L'k 

E 



w*, ■A/'(xj.;i*,,C*,) , • e {e,p} (6) 



where the number L* of mixture components is variable and 
adapted on-line by the proposed Gaussian mixture filter. 



III. Statistical Linearization 

Substituting the Gaussian mixtures representing the noise 
and the state density into the prediction step and the filtering 
step, it can be easily seen that estimation can be performed 
component-wise. For example in case of the prediction, using 
(|3]l and (|6| with • = e in (|5]l gives rise to 



/fc+ifefe+i) — / . 7 . 



Wt 



■UJ 



fcj ■ 



1=1 j=l 



/fefc+ll^fc:Mfe,Wfc)' 



A/'(xfe;x^^,,Cfc^,)-7V(wfc;w;fej,C^j)dxfcdwfcj . (7) 

Thus, it is sufficient to focus in the following on the simplified 
nonlinear transformation 



y = 9{x) 



(8) 



which maps the Gaussian random vector x with density 
Af{x;x, C^) to the random vector y. This nonlinear transfor- 
mation can be replaced by a^, ( • ) in the prediction step and by 
hki' ) ill '^he filtering step, while the Gaussian random vector 
a; in ([8]) represents the joint Gaussian of the state and noise. 

A. Classical Linearization 

Calculating the density or the statistics of y cannot be car- 
ried out in closed form. Hence, directly processing the density 
or the moments is computationally demanding and imprecise, 
or even impossible. An exception are linear transformations, 
where the Kalman filter [2] provides analytic expressions of 
the Bayesian estimation problem. To apply the Kalman filter 
equations to nonlinear transformations, a typical way is to 
linearize the nonlinear transformation, which results in the ex- 
tended Kalman filter ||3). Here, it is assumed that the nonlinear 
transformation can be approximated by a linear transformation 
through a first-order Taylor series expansion around the mean 
X. In case of mild nonlinearities the linearization error of 
this approximation is acceptable. However, for this type of 
linearization the spread of x, i.e., the covariance matrix C^ is 
not taken into account and there is no measure which allows 
to quantify the linearization error. 

B. Statistical Linear Regression 

To overcome these flaws, deterministic sampling techniques 
are employed instead, which allow for propagating the mean 
and the covariance of x through the nonlinear transformation 
(|8]l. In doing so, linearizing the transformation by so-called sta- 
tistical linear regression or statistical linearization is possible 



1 10 1, 1 11 1. More precisely, statistical linearization calculates a 
matrix G and a vector b such that 



y — .9(-^) ^ G ■ x + b , 
where the error term 

e = g{x) — G • ^ + 6 



(9) 



(10) 



describes the deviation of the nonlinear transformation and its 
linear approximation. To determine G and b, the nonlinear 



transformation g{-) is evaluated at a set of weighted regres- 
sion points {ai,x^}i=i,,,L with non-negative weights a.; with 
J2i cti = 1' which resuhs in points y. — g{x^) for i — 1 . . . L. 
This set of points is chosen in such a way that the mean x 
and covariance C^ of x are captured exactly, that is 

L 

X — y ai ■ Xi and C^ = >^ ai ■ {x^ — x) ■ (x^ — x) 



L 

/ J ^i ' -S-i 



Then G and b are determined by minimizing the weighted 
sum of squared errors 



{G, &} = arg min \y^a^■§^ ■ e, 



(11) 



with gj = y. — (G • Xj + 6). The solution of ( fTTj ) is given by 

G = [C'^yf {C^y^ a.nd h = y_-G-x, (12) 

where the set of propagated points {ai, y }i=i...L is used to 
approximate the mean, covariance, and cross-covariance of y 
according to 

L 



L 



, C^ « ^ a, • {y^ - y) ■ {y^ - yf , 



i=l 

The linearization error is characterized by the error term ( [T0| 
and has zero-mean and the covariance matrix 



C^ GC^^G 



xr^T 



(13) 



Thus, by means of the covariance C^ it is possible to quantify 
the linearization error If C*^ is a zero matrix, the density of 
the error e corresponds to a Dirac delta distribution | |12J and 
the transformation g{-) is affine with g{x) = G • ^ + 6. 

C. Calculating the Regression Points 

Many approaches for calculating the set of regression points 
have been proposed in the recent years. They differ in the 
number of regression points L and the way these points are 
chosen. In the following example, both selection schemes used 
in this paper are briefly introduced. 

Example 1 In the simulations described in Sec.[Vi]the famous 
unscented transform \5\ and the Gaussian estimator |6] are con- 
sidered. For both, the calculation of the sigma points x- e R"^ 
can be summarized as 



^j — ^ , 



X- ^ X + Vj ■ Pi 



= l + l + {j-l)-n^ 



where Pj, / = 1 . . . n^, is the Ith column of the matrix P = Vc^ 
and Uj, j ^ I. ..N are scaling factors. This results in a number 
of L = n^ • TV + 1 regression points. The type of the matrix root, 
the scaling factors, and the weights a, of the regression points 
depend on the considered selection scheme. 

In case of the unscented transform, the Cholesky decomposi- 
tion is chos en as m atrix root. For the scaling factors holds N = 2 
and ui = ^n^ + K = -!/2, where k is a scaling parameter. The 



The Gaussian estimator utilizes the eigenvalue decompo- 
sition for calculating the matrix root. The number of scaling 
factors TV and thus the total number L of regression points 
can be varied. Since the scaling factors result from solving a 
optimization problem, there is no closed-form expression. For 
TV = 2 and TV = 4, the scaling factors can be calculated to 

TV = 2 ; Uj € {-1.2245, 1.2245} , 

N ^4: Uj € {-1.4795, -0.5578, 0.5578, 1.4795} . (14) 

The regression points are equally weighted with Vi : a,; = |; . 

It is important to note that the proposed adaptive Gaussian 
mixture filter is not restricted to these two selection schemes. 
In fact, any selection scheme for statistical linearization in- 
cluding those described in tl3J-L15J can be used, depending 
on the considered application as well as the desired estimation 
performance and computational demand. 

IV. Splitting Scheme 

Given a random vector x, whose density function /^ (x) is 
a Gaussian mixture with i^ components according to (|6]l, it 
is possible to linearize the nonlinear transformation g{-) for 
each component of f^{x). This kind of component-wise or 
local linearization leads to an improved approximation of the 
true density function /^ (y) of y compared to a single, global 
linearization. To further improve the approximation, especially 
in case of strong nonlinearities and/or large variances of some 
components, the idea is to select a component of f^{x) and 
split it into several components with reduced weights and 
covariances. It was demonstrated for example in |8 1 that the fil- 
tering accuracy of local linearization approaches benefits from 
this decrease of the covariances and simultaneous increase of 
the number of Gaussians. 

A. Component Selection 

A straightforward way to select a Gaussian component 
for splitting is to consider the weights wf, i ~ l...L^. 
The component with the highest weight is then split. This 
however does not take the nonlinearity of g{-) in the support 
of the selected component into account. Since linearization 
is performed component-wise and locally, a more reasonable 
selection would be to consider also the induced lineariza- 
tion error of each component. For this purpose, statistical 
linearization akeady provides an appropriate measure for the 
linearization error in form of the covariance matrix C^ in ( [T3] l. 

In order to easily assess the linearization error in the multi- 
dimensional case, the trace operator is applied to C^, which 
gives the measure 



e = trace (C) e [0,oo) . 



(15) 



weights are ai = 



and ai = 



2{n^ + K) 



for i > 1. 



Geometrically speaking, the trace is proportional to circum- 
ference of the covariance ellipsoid corresponding to C^. The 
larger C"^ and thus the linearization error, the larger is e. 
Conversely, the trace is zero, if and only if C is the zero 
matrix, i.e., e = <^ C^ = 0. Hence, ([TSj is only zero, 
when there is no linearization error, that is, the nonlinear 
transformation g( • ) is affine in the support of the considered 
Gaussian component. 



Besides the linearization error, the contribution of a com- 
ponent to the nonhnear transformation is important as well. 
That is, the probability mass of the component, which is given 
by its weight ujf, has also to be taken into account. This 
avoids splitting irrelevant components. Putting all together the 
criterion for selecting a component i for splitting is defined as 

s, = {ujfr ■ (1 - exp (-q))'"^ G [0, 1] (16) 

for i = 1 . . . L^, where 1 — exp (— e^) normalizes the lin- 
earization measure ([TSj into the interval [0, 1]. For a geometric 
interpolation between weight and linearization error of compo- 
nent i, the parameter 76 [0,1] used. With 7 = 0, selecting a 
component for spUtting only focuses on the linearization error, 
while 7 = 1 considers the weight only. 

Component selection criteria for splitting have also been 
proposed in p6| , fTT) . The criterion in p6) is designed for the 
unscented transform only, while the criterion in fTT| can only 
be calculated analytically in some special cases. The proposed 
criterion instead is generally applicable. 

B. Splitting a Gaussian 

Assume that according to the selection criterion ( fT6| ), the 
Gaussian component uj ■M{x;,x, C^) is chosen. Splitting this 
Gaussian into many can be formulated as replacing the Gaus- 
sian by a Gaussian mixture according to 



uj ■ Af{x-, i, C^) w ^ Ljj ■ Af{x; x^ , Cj) 






(17) 



It can be easily verified that for L > I, the number of 
free parameters, i.e., weights, mean vectors and covariance 
matrices, is larger than the number of given parameters. More 
precisely, splitting a Gaussian is an ill-posed problem. In order 
to reduce the degrees of freedom and to not introduce errors 
concerning the mean and covariance, splitting is performed in 
a moment-preserving fashion. Thus, it must hold that 

L L 



= E^J- ' ^=E^ 
L 



(18) 



^^ = E^-( 



^ ? ' -^i ^ 



-T 
■J -J 



i=i 



To further simplify the problem, splitting is restricted in 
direction of the eigenvectors of C^, which is computationally 
cheap and numerically stable. Furthermore, it reduces the 
problem to splitting a univariate standard Gaussian. 

1) Univariate standard Gaussian: A moment-preserving 
split of a univariate standard Gaussian J\f{x; 0, 1) into a 
mixture with L components requires to determine 3 • L free 
parameters. By forcing symmetry, i.e., the means Xj are placed 
symmetrically around the mean x with symmetrically chosen 
weights and for the variances holds Vj : a^ = a, the number of 
free parameters is reduced to i + 1. In (18\, a splitting library 
with symmetric components is proposed. Unfortunately, pre- 
serving the moments is not guaranteed. Instead, the following 
split into two components is used throughout this paper 



Example 2 Following the approach proposed in (T9), the 
univariate standard Gaussian is split into the mixture 

I ■JV{x;x,a'^) + \ -Mix; -x,a^). The moment-preserving con- 
straints of splitting {18) lead to the dependency o-^ = 1 - i^ 
between x and a, where x is now the only free parameter. 
This equation is valid for x e [-1,1] and contains the trivial 
solution a; = 0. Generally, x may be determined dynamically by 
minimizing the resulting linearization error. But throughout this 
paper, x is set to 0.5 for simplicity. 

To determine the parameters of more than two components, 
additional constraints, e.g., capturing higher order moments 
like the skewness or the kurtosis have to be considered 
additionally. Since splitting is performed recursively in this 
paper (see Sec.|V]l, the new introduced components can be split 
in the subsequent splitting step if the local linearization error 
may not be reduced sufficiently. Splitting into two components 
is a good compromise between reducing the linearization error 
on the one hand and controlling the growth of the number of 
components and the computational load on the other hand. 

2) Multivariate Gaussian: Applying univariate splitting to 
the multivariate case requires the eigenvalue decomposition 
of the covariance matrix C^ — VDV^, with V being the 
matrix of eigenvectors and D being the diagonal matrix of 
eigenvalues according to 



V=[«i 



J , D = diag(Ai,A2,...,A„J 



where v^ G R"^ are the (orthonormal) eigenvectors and Ai 
are the eigenvalues. V is a rotation matrix and the eigenvalue 
decomposition of C^ corresponds to the transformation 



of a Gaussian random vector z with density 



(19) 



(20) 



to a Gaussian random vector x with density Af{x;x,C^). 
Since the Gaussian f^{z) has a diagonal covariance matrix, 
the eigenvectors are parallel to the axes of the coordinate 
system. Thus, univariate splitting can be easily applied along 
the eigenvectors by replacing a univariate Gaussian on the 
right-hand side of pO] ) by a Gaussian mixture. 

Assuming that eigenvector W; is chosen for splitting and let 
J2i=i ^'i '-^{^V^ •^ji"'?) bs the Gaussian mixture that approx- 
imates a univariate standard Gaussian as described above. As 
this mixture approximates a standard Gaussian, its components 
have to be shifted by adding zi and scaled by multiplying 
with \fXi in order to match the mean zi and the variance A;, 
respectively. These operations result in 

L 



N{zi-zi,\i)K^Jj-N[zi-zi + ^iz'^,\i(y'^^ . (21) 

Plugging (J2T]l into (|20| leads to 

7V(z;l,D) w 

L n^ 

^w^. •7V(z/;z, + v/aIz^-,A/ct|) • ]^7V(z,; i„ A^) . 

7 = 1 i=l 

1=^1 



Transforming this mixture via ([T9| gives the desired splitting 
result ( [TT] ) with the weights, means, and covariance matrices 



W -UJa 



X + \A\ ■ z'. ■ V, 



(22) 



C^ + Xri<7^-l)-v,v] 



for j = 1 . . . L. 

It is worth mentioning that the calculation of the parameters 
in ( p2] l is independent of the number of components L and 
does not necessarily require a symmetric, moment-preserving 
splitting. Thus, arbitrary splitting methods of univariate stan- 
dard Gaussians besides those described in this paper, can be 
used with these formulae. 

C. Splitting Direction 

So far, no criterion for selecting an appropriate eigenvector 
for splitting is defined. A straightforward criterion may be the 
eigenvector with the largest eigenvalue as in fTS), | |T9| . But 
since ( [T6| ) determines the Gaussian component that causes the 
largest linearization error, merely splitting along the eigenvec- 
tor with the largest eigenvalue does not take this error into 
account. 

The key idea of the proposed criterion is to evaluate the 
deviation between the nonlinear transformation ([8| and its 
linearized version (|9]) along each eigenvector. The eigenvector 
with the largest deviation is then considered for splitting, i.e., 
the Gaussian is split in direction of the largest deviation in 
order to cover this direction with more Gaussians, which will 
reduce the error in subsequent linearization steps. 

By means of the error term ( [TO] i, the desired criterion for 
the splitting direction is defined as 

dr-= I e{xi{v))^-e{xi{u))-M{xi{v)-x,C-)dv (23) 



with Xi{v) '■= X + V ■ y_i, I = 1 . . . rix, and W; being the lih 



eigenvector C^. The integral in (j23]l cumulates the squared 
deviations along the l\h eigenvector under the consideration 
of the probability at each point Xi{v). The eigenvector that 
maximizes ( |23| ) is then chosen for splitting. Unfortunately, 
due to the nonlinear transformation g{-) in ( [TO] i, this integral 
cannot be solved in closed-form in general. For an efficient 
and approximate solution, the regression point calculation 



schemes described in Sec. III-C are employed to approximate 
the Gaussian in ( |23] l in direction of W; by means of a Dirac 
mixture. This automatically leads to a discretization of the 
integral at a few but carefully chosen points. 

V. Adaptive Gaussian Mixture Filter 
Based on the statistical linearization described in Sec. |lll] 



and the splitting procedure proposed in Sec. IV the complete 



adaptive Gaussian mixture filter (AGMF) is now derived. The 
key idea of AGMF is to dynamically increase the number of 
Gaussians of a given mixture at regions with large linearization 
errors. The number is reduced after each prediction and filter- 
ing in order to limit the computational and memory demand. 
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Jk 
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Figure 1 . Flow chart of the proposed adaptive Gaussian mixture filter. Both 
the prediction and filtering step employ splitting and reduction for adapting 
the number of mixture components. 



A. Prediction Step 

The major operations to be performed in the prediction 
step are illustrated in the upper part of Fig. [T] The following 
paragraphs provide detailed descriptions of these operations. 

1) Linearization: As shown in ^ the prediction step 
can be performed component-wise. Therefore, the nonlinear 
system function is linearized statistically at the weighted joint 
Gaussian u}k,s ■■^iKk'y^k,s^^k s) ^^^ =^fe 



[^fc>«^I]^' 



ii-l)-Lf+j^l. 



^k.s 



^-k,j. 



■ ^k ■ ^fe ' ^k,. 



^k,s 



'k.j 



and 



^k,j 



With ( [T2| ), the linearization results in 



ifc+l 



= [Ak,s Bfe„,] ■X. + b, 



(24) 



= Gfc 



2) Splitting: Due to the nonlinearity of the sys- 
tem function af^{-), some of the mixture components 
Wfe,s ■ ■N' {X_i.\ X_f. 5,C^g) may locally cause severe lineariza- 
tion errors. These errors are quantified by means of the selec- 
tion criterion ( fTS) . The component maximizing ( [T6| ) will be 
split in direction of the largest deviation between the nonlinear 
function a^, ( • ) and its linearized version (|24li as described 
in Sec. 



IV-C 



After splitting this Gaussian, linearization is 
performed for the newly introduced mixture components. The 
linearization need not to be repeated for the remaining mixture 
components as they are not affected by the splitting. 

Splitting Gaussians and the subsequent linearization is re- 
peated until a stopping criterion is satisfied. This stopping 
criterion combines three user-defined thresholds: 

1) For each component the value of the selection criterion 
([16]) shall drop below the error threshold emax G [0, 1]. 

2) The number of Gaussians shall not grow beyond the 
component threshold imax- 

3) The deviation between the original Gaussian mixture 
/(x) and the mixture obtained via splitting f{x) shall 
remain below a deviation threshold dmax G [0, 1]. 



In the latter case, the deviation is determined by means of 
the normaUzed integral squared distance measure pO) 



D{f{x),f{x)) 



/(/(^)-/fe))'dx 



e [0, 1] 



/ fUy dx + J f{xY dx 

Since splitting always introduces an approximation error to the 
original mixture f{x), tracking the deviation during splitting 
and keeping the deviation below the threshold rf,„ax avoids 
that errors introduced by splitting neutralize the gain in lin- 
earization. Splitting stops, if at least one threshold is reached. 
3) Prediction: Let oJk.s • J^{X_k\2Lk,s^^k s) ^^ ^^^ Gaus- 
sians resulting from the splitting step, with s = 1 . . . L| 
and Z^ ^ L'j.-L'^. Based on these Gaussians and their 
corresponding locally linearized system models ( |24] i, the pa- 
rameters of each component of the predicted Gaussian mixture 
/fc+ifefc+i) can be calculated by means of the Kalman pre- 
dictor according to 



"k+l,s 
£fc+l,s 



^ ^kj 

Afe„ 



^l+i.s - Afe,sC^ -Aj. 



hBkrs-m^s+tk.s , 



where C^ j is the linearization error covariance ([TSj. 

4) Reduction: The number of components L^ in 
/fc+il^ife+i) grows due to the multiplication of the Gaussian 
mixtures f^{xf.) and f^{w_i^) for prediction and due to 
splitting. It is necessary to bound this growth in order 
to reduce the computational and memory demand of 
subsequent prediction and filtering steps. For this purpose, 
one can exploit the redundancy and similarity of Gaussian 
components. Furthermore, many components will have 
weights close to zero, thus they can be removed without 
introducing significant errors. To reduce a Gaussian mixture, 
many algorithms have been proposed in the recent years (see 
for example p0)-p3|). Most of these algorithms require a 
reduction threshold L^^^-^-typically much smaller than imax- 
to which the number of components of the given Gaussian 
mixture has to be reduced. In the simulations, Runnalls' 
reduction algorithm pT) is employed as it provides a good 
trade-off between computational demand and reduction errors. 

With the reduction to L^ , j components, the calculation of 
the predicted Gaussian mixture fl.j^i{xf.j^i) in (|6]l is finished. 

B. Filtering Step 

The operations to be performed for the filtering step are 
almost identical to the prediction step (see Fig. fT). Thus, 
only linearization and filtering are described in the following. 
Splitting and reduction coincide with the prediction step. 

1) Linearization: Linearization and filtering are also per- 
formed component-wise. Let LOk.s -Afi^Lki^Lk s' ^k's) ^^ ^^ 
joint Gaussian comprising the ith component of the predicted 
mixture f^{xf.) and the jth component of the measurement 
noise mixture Q, where s — {i — 1) ■ L1+ j ~ I . 
The corresponding linearized measurement model is 

with joint state X f. = [^^jtjj] . 



tP . TV 

^k ^k ■ 



(25) 



2) Filtering: Let wj-^j •7V(Xj.; X^,^^, C^-) be the Gaus- 
sians resulting from splitting, with s — I . . . L^, and Z/| ^ 
L^- L^. Given the current measurement value Zj,, the Kalman 
filter update equations applied on these Gaussians and their 
corresponding locally linearized measurement models p5] ) 
give rise to the parameters of each component of the posterior 
Gaussian mixture /|(xj,) 

Cfc-Wfc,s-7V(zfc;Zfc.5,Sfc,5) , 
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(26) 



+ 'Dk,s-Vk,s + 



with predicted measurement Zj^ g 

&J. J, Kalman gain K^^^ = C^gH^-SjTj, innovation co- 

variance S^., = H,,,C^ .HtJ + D,,,C-^,DT^- + C^,,, 

and Cfc_5 being the linearization error covariance ( pjj ). The 

calculation of the weight cj^ ^ in ( |26] l is adapted from |8l, 19), 

where Ck = l/J2s^k,s-J^{zf^',Zf.g,Sk,s) is a normahzation 

constant. 

After the reduction to L^. components, the posterior Gaus- 
sian mixture J^{xi.) in (|6| is completely determined. 

VI. Simulation Results 

Two numerical simulations are conducted in order to 
demonstrate the performance of the proposed AGMF. 

A. Shape Approximation 

In the first simulation, the nonlinear growth process 



V = g{x) = 77 + 5 • 



1 



w 



adapted from |24| is considered, where x ~ [l^^w\^ ^ 
f^{x) = Af{x; [1,0] "^,12) with I„ being the n x n identity 
matrix. To approximate the density of y, the Gaussian f^{x) is 
split recursively into a Gaussian mixture, where the number of 
components is always doubled until a maximum of 64 compo- 
nents is reached. No mixture reduction and no thresholds emax, 
c^max are used. The Gaussian estimator with 4 scaling factors 
according to ([T4]i is employed for statistical linearization. The 
true density of y is calculated via numerical integration. 

Two different values for the parameter 7 of the selection 
criterion ( [T6] l are used: 7 = 0.5, which makes no preference 
between the component weight and the linearization error and 
7 == 1, which considers the weight only. Furthermore, a rather 
simple selection criterion is considered for comparison, where 
selecting a Gaussian for splitting is based on the weights only 
(as it is the case for 7 = 1), while the splitting is performed 
in direction of the eigenvector with the largest eigenvalue. 



Table I 

Approximation error (KLD x 10) for different splitting 

schemes and numbers of components. 



splitting 
scheme 



1 



number of Gaussians 
4 8 16 



max. eigenvalue 

7 = 1 
7 ■■ 



0.5 



2.01 
2.01 
2.01 



0.77 
0.77 
0.77 



0.64 0.47 0.39 
0.59 0.34 0.20 
0.40 0.22 0.07 



32 



0.21 
0.12 
0.03 



64 



0.26 
0.07 
0.02 



0.4 



0.3 
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True 

Approx. L = 1 
Approx. L = 2 

Approx. L = 8 

Approx. L = 32 



0.1 




Figure 2. True density function of y (black, dashed) and approximations 
with an increasing number of mixture components. 



Table [I] shows the Kullback-Leibler divergence (KLD, 125)) 
between the true density of y and the approximations obtained 
by splitting. The approximations of the proposed splitting 
scheme are significantly better than the approximations of the 
largest eigenvalue scheme. This follows from the fact that the 
proposed scheme not only considers the spread of a compo- 
nent. It also takes the linearization errors into account. In doing 
so, the Gaussians are always split along the eigenvector that 
is closest to ^, since this variable is transformed nonlinearly, 
while w is not. This is different for the largest eigenvalue 
scheme, which wastes nearly half of the splits on w. 

The inferior approximation quality for 7=1 compared to 
7 = 0.5 results from splitting components, which may have a 
high importance due to their weight but which do not cause 
severe linearization errors. Thus, splitting these components 
will not improve the approximation quality much. 

In Fig. l2j the approximate density of y is depicted for 
different numbers of mixture components for 7 — 0.5. 
With an increasing number of components, the approximation 
approaches the true density very well. 

B. Object Tracking 

For the second simulation example, a object tracking sce- 
nario is considered. The kinematics of the mobile object are 
modeled by means of the bicycle model 



i/c+l 



Xk+1 




"cos(0fe)' 


Vk+i 


= ^fc + 


sin(0fe) 


.^fc+i. 
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where the system state x^. comprises the position [xkjyk]"^ 
and the orientation (/>j, of the bicycle. At time step fc = 0, the 
initial estimate of the state Xq is represented by a Gaussian 
density with mean Xg = [100 m, 100 m, rad]"'" and covari- 
ance matrix Cg = diag([10^, 10^, tt^]). The system input 
Uk := tan(afe) with a^ being the steering angle, is chosen ran- 
domly and uniformly distributed from the interval [—0.2, 0.2] 
at each time step. The system noise w^. is zero-mean Gaussian 
with covariance matrix C™ = diag(0.1^, 0.1^, 0.01^). 
A radar sensor with measurement model 



^fe 



\/x 
arctan {y^^/xk) 



Vl 



' ^k 



is employed for observing the object, where the measurement 
noise i;^. is modeled as unimodal glint noise f2E\ with density 
Fkivk) = {l-(i)-N{v^;Q,Cl) + p-N{vk;Q,Cl) with 
covai-iances C^ ^^ = diag(l2,0.P) and C^ 2 = diag(2^0.22). 
The parameter (5 refers to the glint noise probability. Six prob- 
ability values /3 — {0, 0.2, . . . , 1} are exploited for simulation. 
By increasing /3 it is possible to investigate the performance 
of the filters for stronger noise, which is also heavily tailed 
for /? 7^ and fi^l. 

For this simulation setup, AGMF is applied with parameter 
7 — 0.5, error threshold e,„ax = 0.05, deviation threshold 
rfmax — Ij and component threshold Lmax = 128 for both 
prediction and filtering. Three values of reduction thresholds 
are used, L^ — Lf. = 2, 8, 32. For comparison, a Gaussian 
mixture filter (denoted as MWE) employing the simple largest- 
weight-largest-eigenvalue-criterion as described in the previ- 
ous section is considered. Further, the adaptive level of detail 
(ALD) Gaussian mixture filter proposed in [161 is employed as 
well. Since ALD is only designed for the unscented transform 
(see Example [Til, this statistical linearization method is also 
used for AGMF to allow a fair comparison. The scaling 
parameter k of the unscented transform is set to 0.5, i.e., all 
regression points are equally weighted. MWE and ALD use 
the same parameters as AGMF, except that MWE always splits 
until Lniax is reached since it exploits no linearization errors. 

Besides these Gaussian mixture filters, a particle filter (PF) 
with residual resampling and 10, 000 samples as well as the 
unscented Kalman filter (UKF, Q) with k = 0.5 are also 
applied. 

For each glint probability and each reduction threshold, 50 
Monte Carlo simulation runs are performed, where the object 
is observed for 100 time steps. In Fig. [3](a), the average root 
means square error (rmse) of the position and the average 
runtime per simulation run are depicted. The AGMFs with 8 
and 32 components provide the best tracking performance. The 
PF is close to AGMF, but with a significantly higher runtime. 
Conversely, the UKF is by far the fastest algorithm, but leads 
to diverging estimates. 

The splitting criterion used for ALD selects components that 
exhibit a high degree of nonlinearity. But splitting is performed 
merely in direction of the largest eigenvalue. This explains the 
relative poor tracking performance of ALD. 

Even if MWE is allowed to split until Lmax is reached, the 
performance of MWE is always inferior to AGMF. This is 
due to wasting many splits, e.g., in the prediction step only 
one quarter and less of the splits is used for </>;,, which is 
the only nonlinearly transformed variable. Here, AGMF is 
much more effective thanks to the novel splitting criterion. 
Besides splitting mainly in direction of the nonlinearity, it 
does not require all available splits as shown in Fig. [3](b). The 
maximum number of splits is L^ax — L^ in the prediction step 
and analogously in the filtering step. But at most 40 splits are 
performed in case of the strongest noise and when the state 
mixture is reduced to two components. If more components 
are allowed to represent the state density, the number of splits 
decreases as the approximation before splitting is already of 
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Figure 3. (a) Average rmse over all simulation runs and average runtime per simulation run for each (3. (b) Average number of splits performed per prediction 
or filtering step of the AGMF for each /3. 



high quahty. This also reduces the rantime as can be seen 
when comparing for example AGMF 32 with AGMF 2. Here, 
the time consuming splitting operation has to be performed 
less often and the reduction operation has to reduce a mixture 
with an already low number of components. 

VII. Conclusions 

In this paper, a novel adaptive Gaussian mixture filter 
has been proposed. It is based on statistical linearization, 
which allows quantifying the induced linearization errors in 
terms of a linearization error covariance matrix. A criterion 
based on this covariance matrix is used for selecting Gaussian 
components for splitting, while the direction of the split is 
performed in direction of the eigenvalue with the strongest 
linearization errors. Compared to other splitting criteria, the 
proposed one reliably detects strong nonlinearities and keeps 
the number of splits on a low level. Furthermore, arbitrary 
approaches for statistical linearization can be employed. 
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